Abstract: Mobile object tracking plays an important role in computer vision applications. In this paper, we use a taxonomy based on the tracked target to present object tracking algorithms. The tracked targets are divided into three categories: points of interest, appearance and silhouette of mobile objects. The advantages and limitations of the tracking approaches are also analyzed in order to identify future directions in the object tracking domain.

Keywords: Object tracking; Computer vision; Video surveillance

I. Introduction 

Nowadays video surveillance systems are installed worldwide in many different sites such as airports, hospitals, banks, railway stations and even at home (see figure 1). Surveillance cameras help a supervisor to oversee many different areas from the same room and to quickly focus on abnormal events taking place in the controlled space. However, one question arises: how can a security officer simultaneously analyse dozens of monitors in real time (see figure 2) while missing as few abnormal events as possible? Moreover, observing many screens for a long period of time becomes tedious and draws the supervisor's attention away from the events of interest. The solution to this issue lies in three words: intelligent video monitoring.

The term "intelligent video monitoring" covers a fairly large research direction that is applied in different fields, for example robotics and home care. In particular, a considerable amount of research has already been carried out on video surveillance applications. Figure 3 presents the processing chain of a video interpretation system for action recognition. Such a chain generally includes several tasks: image acquisition, object detection, object classification, object tracking and activity recognition. This paper studies the mobile object tracking task.

The aim of an object tracking algorithm is to generate the trajectories of objects over time by locating their positions in every frame of the video. An object tracker may also provide the complete region in the image that is occupied by the object at every time instant. Mobile object tracking plays an important role in computer vision applications such as home care, sport scene analysis and video surveillance-based security systems (e.g. in banks, car parks and airports). In terms of vision tasks, object tracking provides object trajectories for several higher-level tasks such as activity recognition, learning of interest zones or paths in a scene and detection of events of interest.

Figure 1. Illustration of some areas monitored by video cameras

Figure 2. A control room for video surveillance (source [30])

In this paper, we present a classification of tracking algorithms based on tracked target categories [37]. For each tracker category, we present some typical approaches, their advantages as well as their limitations. This paper is organized as follows. Section II presents an overview of the tracker taxonomy. Sections III, IV and V present each tracker category in detail. A conclusion is given in Section VI.

Figure 3. Illustration of a video interpretation system. The first row presents the task names. The second row presents result illustrations of the corresponding tasks.



II. Overview of Tracking Algorithm Classification



The tracking algorithms can be classified by different
criteria. In [24], based on the techniques used for tracking,
the author divides the trackers into two categories:
model-based and feature-based approaches. While a
model-based approach needs a model for each tracked
object (e.g. a color model or a contour model), a feature-based
approach uses visual features such as Histogram of Oriented
Gradients (HOG) [16] or Haar [33] features to track
the detected objects. In [2], the tracking algorithms are
classified into three approaches: appearance model-based,
geometry model-based and probability-based approaches.
The authors in H] divide the people tracking algorithms 
into two approaches: using human body parts and without 
using human body parts. 

In this paper, we present the object tracking classification proposed by [37] because this classification covers the tracking methods existing in the state of the art clearly and quite completely. This taxonomy relies on the "tracked targets". The tracked targets can be points of interest, the appearance or the silhouette of a mobile object. Corresponding to these target types, three approach categories for object tracking are determined: point tracking, appearance tracking and silhouette tracking. Figure 4 presents the taxonomy of tracking methods used in this paper.

• Point tracking: The detected objects are represented by points, and the tracking of these points is based on the previous object states, which can include object positions and motion. An example of object correspondence is shown in figure 5(a).

• Appearance tracking (called "kernel tracking" in [37]): The object appearance can be, for example, a rectangular template or an elliptical shape with an associated RGB color histogram. Objects are tracked by considering the coherence of their appearances in consecutive frames (see the example in figure 5(b)). The object motion between frames is usually modelled as a parametric transformation such as a translation, a rotation or an affine transformation.

• Silhouette tracking: The tracking is performed by estimating the object region in each frame. Silhouette tracking methods use the information encoded inside the object region. This information can take the form of an appearance density or of shape models, which are usually edge maps. Given the object models, silhouettes are tracked by either shape matching or contour evolution (see figures 5(c) and 5(d)).



III. Point Tracking 
A. Deterministic Approaches 

According to [35], a deterministic system is a system in which no randomness is involved in the development of its future states. A deterministic model thus always produces the same output from a given starting condition or initial state. To apply this idea to object tracking, the object movements are generally assumed to follow some trajectory prototypes. These prototypes can be learned offline, learned online, or constructed from a scene model. Many tracking algorithms in the state of the art are based on this idea [29], [8], [11].

In [29], the authors present a method to learn some tracking parameters offline using ground-truth data. In the offline phase, the authors define an energy function to measure the correctness of people's trajectories. This function is denoted $E(x_t)$, where $x_t$ is a 2D vector containing the pedestrian's location at time $t$.

The authors assume that a pedestrian path is constrained by the following four rules. Each rule is represented by an energy function.

1) The displacement of a person between two consecutive frames is not too large. The energy function expressing this rule is denoted $E_1(x_t)$.

2) The speed and direction of a person's movement should be constant. The energy function corresponding to this rule is denoted $E_2(x_t)$.

3) People's movements should reach their destinations. The energy function representing this rule is denoted $E_3(x_t)$.

4) People tend to avoid other people in the scene. The energy function of this rule is denoted $E_4(x_t)$.

The complete energy $E(x_t)$ is a weighted combination of these components:

$$E(x_t) = \sum_{i=1}^{4} \theta_i E_i(x_t) \tag{1}$$

where $\theta_i$ represents the weight of the energy function $E_i$.
This complete energy function is used to predict the pedestrian locations in the next frame: the pedestrians should move to the locations that minimize this energy. The objective of the training phase is to learn the values $\theta_i$ that make the predicted pedestrian tracks match the corresponding tracks in the ground-truth data. To accomplish this, the authors define a loss function $L(x^*, g)$ that measures the difference between a predicted track $x^*$ and the ground-truth track $g$ as follows:

(2)

where $x_t$ and $g_t$ are the locations of the predicted track and of the ground-truth track at time $t$, and $N_g$ is the number of positions of the considered track. The learned values $\theta_i$ are used later in the testing phase to predict pedestrian trajectories. Figure 6 shows some examples of the predicted paths (in red) and their corresponding reference paths (in black).

Figure 4. Taxonomy of tracking methods (adapted from [37]).

Figure 5. Illustration of different tracking approaches. (a) Multipoint correspondence, (b) Parametric transformation of a rectangular patch, (c, d) Two examples of silhouette matching (source [37]).
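As an illustration of how a weighted energy of this kind can drive the prediction, the following Python sketch scores candidate positions around the previous location and keeps the one with minimum energy. The four energy terms, the function names and the grid search are hypothetical simplifications chosen for this example; they are not the definitions used in [29].

```python
import math


def displacement_energy(pos, prev):
    """Rule 1 (assumed form): squared displacement since the previous frame."""
    return (pos[0] - prev[0]) ** 2 + (pos[1] - prev[1]) ** 2


def velocity_energy(pos, prev, prev_velocity):
    """Rule 2 (assumed form): squared change with respect to the previous velocity."""
    vx, vy = pos[0] - prev[0], pos[1] - prev[1]
    return (vx - prev_velocity[0]) ** 2 + (vy - prev_velocity[1]) ** 2


def destination_energy(pos, goal):
    """Rule 3 (assumed form): remaining distance to the destination."""
    return math.hypot(pos[0] - goal[0], pos[1] - goal[1])


def avoidance_energy(pos, others, sigma=1.0):
    """Rule 4 (assumed form): Gaussian penalty for being close to other people."""
    return sum(
        math.exp(-((pos[0] - o[0]) ** 2 + (pos[1] - o[1]) ** 2) / (2 * sigma ** 2))
        for o in others
    )


def total_energy(pos, prev, prev_velocity, goal, others, theta):
    """Weighted sum of the four terms; theta holds the four weights."""
    terms = (
        displacement_energy(pos, prev),
        velocity_energy(pos, prev, prev_velocity),
        destination_energy(pos, goal),
        avoidance_energy(pos, others),
    )
    return sum(w * e for w, e in zip(theta, terms))


def predict_next(prev, prev_velocity, goal, others, theta, radius=2):
    """Return the grid position around `prev` that minimises the total energy."""
    candidates = [
        (prev[0] + dx, prev[1] + dy)
        for dx in range(-radius, radius + 1)
        for dy in range(-radius, radius + 1)
    ]
    return min(
        candidates,
        key=lambda p: total_energy(p, prev, prev_velocity, goal, others, theta),
    )
```

With only the destination term weighted, the predicted position moves as far as the search radius allows towards the goal; with only the displacement term weighted, the pedestrian stays in place. Learning the weights would amount to choosing `theta` so that such predictions match the ground-truth tracks.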

The advantage of this approach is that its performance does not depend on the quality of the object detection process. However, the rules used can be incorrect for complex people movements: the pedestrian destination can change, and obstacles are often not stable over time. The estimated pedestrian velocity is only reliable if the pedestrian is always detected correctly. In addition, experiments are only carried out on simple sequences.

In [8], the authors present a tracking algorithm based on a HOG descriptor [16]. First, the FAST algorithm [27] is used to detect the points of interest. Each point is associated with a HOG descriptor (including gradient magnitude and gradient orientation). The authors compute the similarity of the HOG points located in consecutive frames to determine the pairs of matched points. The object movements can then be determined using the trajectories of their points of interest (see figure 7). In the case of occlusion, the authors compare the direction, speed and displacement distance of the point trajectories of occluded objects with those of objects in previous frames to split the bounding box of occluded objects (see figure 8). This approach performs well in the case of occlusions in which the object appearance is not fully visible. However, the reliability of the HOG descriptor decreases significantly if the contrast between the considered object and its background is low.
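The matching of point descriptors between consecutive frames can be sketched as follows. This is a minimal illustration that assumes each point of interest already carries a descriptor vector; the Euclidean distance, the mutual nearest-neighbour test and the `max_dist` threshold are assumptions made for the example and are not taken from [8].

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_points(prev_desc, curr_desc, max_dist=0.5):
    """Match descriptors of frame t-1 to descriptors of frame t.

    A pair (i, j) is kept when j is the nearest neighbour of i, i is the
    nearest neighbour of j, and their distance does not exceed max_dist.
    """
    if not prev_desc or not curr_desc:
        return []
    forward = [
        min(range(len(curr_desc)), key=lambda j: l2_distance(d, curr_desc[j]))
        for d in prev_desc
    ]
    backward = [
        min(range(len(prev_desc)), key=lambda i: l2_distance(prev_desc[i], d))
        for d in curr_desc
    ]
    return [
        (i, j)
        for i, j in enumerate(forward)
        if backward[j] == i and l2_distance(prev_desc[i], curr_desc[j]) <= max_dist
    ]
```

Chaining the matched pairs over successive frames gives one trajectory per point of interest, from which the object movement can be estimated.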

B. Probabilistic Approaches 

Probabilistic approaches represent a set of object tracking methods which rely on the probability of object movements. In this approach, the tracked objects are represented as one or many points. One of the most popular methods of this approach is Kalman filter-based tracking. A Kalman filter is essentially a set of recursive equations that can help to model and estimate the movement of a linear dynamic system. Kalman filtering is composed of two steps: prediction and correction. The prediction step uses the state model to predict the new state of variables:

$$X_t^- = D X_{t-1}^+ + W \tag{3}$$

$$P_t^- = D P_{t-1}^+ D^T + Q \tag{4}$$

where $X_t^-$ and $X_{t-1}^+$ are respectively the predicted and corrected states at time $t$ and $t-1$; $P_t^-$ and $P_{t-1}^+$ are respectively the predicted and corrected covariances at time $t$ and $t-1$. $D$ is the state transition matrix which defines the relation between the state variables at time $t$ and $t-1$, $W$ is a noise matrix and $Q$ is the covariance of the noise $W$. Similarly, the correction step uses the current observations $Z_t$ to update the object's state:

$$K_t = P_t^- M^T \left( M P_t^- M^T + R \right)^{-1} \tag{5}$$

$$X_t^+ = X_t^- + K_t \left( Z_t - M X_t^- \right) \tag{6}$$

$$P_t^+ = P_t^- - K_t M P_t^- \tag{7}$$

where $M$ is the measurement prediction matrix, $K_t$ is the Kalman gain and $R$ is the covariance matrix of the measurement noise. An illustration of the Kalman filter steps can be found in figure 9. The Kalman filter is widely used in the vision community for tracking [7], [9], [10].

Figure 6. Examples of pedestrian paths, shown in black, and predicted paths, shown in red. The model accurately predicts the deflection of pedestrians due to oncoming obstacles (source [29]).

Figure 7. Illustration of a point tracker for the KTH dataset; the recovered panel labels are (c) frame 27 and (d) frame 37 (source [8]).

Figure 8. Illustration of splitting a merged-object bounding box for a synthetic video sequence: (a) before merging (frame 12), (b) merging (frame 40), (c) split (frame 41) (source [8]).

Figure 9. Illustration of Kalman filter steps (source [20]).
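The prediction and correction steps can be made concrete with a small constant-velocity example. The sketch below tracks a single coordinate with state (position, velocity) and measures the position only, so that the 2 x 2 matrix products can be written out by hand. The function names, the time step `dt` and the noise levels `q` and `r` are assumptions made for this illustration.

```python
def kalman_predict(x, P, dt=1.0, q=1e-3):
    """Prediction step for the state x = [position, velocity].

    Uses the transition matrix D = [[1, dt], [0, 1]] and process noise
    covariance Q = q * I, i.e. x_pred = D x and P_pred = D P D^T + Q.
    """
    position, velocity = x
    x_pred = [position + dt * velocity, velocity]
    p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q
    return x_pred, [[p00, p01], [p10, p11]]


def kalman_correct(x_pred, P_pred, z, r=1.0):
    """Correction step with measurement matrix M = [1, 0] (position only)."""
    innovation_cov = P_pred[0][0] + r            # M P M^T + R (a scalar here)
    gain = [P_pred[0][0] / innovation_cov,       # K = P M^T / (M P M^T + R)
            P_pred[1][0] / innovation_cov]
    innovation = z - x_pred[0]                   # Z - M x_pred
    x = [x_pred[0] + gain[0] * innovation,
         x_pred[1] + gain[1] * innovation]
    # P = P_pred - K M P_pred; M P_pred is the first row of P_pred.
    P = [[P_pred[i][j] - gain[i] * P_pred[0][j] for j in range(2)]
         for i in range(2)]
    return x, P


def track(measurements, x0=(0.0, 0.0), p0=10.0):
    """Run the filter over a list of position measurements."""
    x, P = list(x0), [[p0, 0.0], [0.0, p0]]
    for z in measurements:
        x, P = kalman_predict(x, P)
        x, P = kalman_correct(x, P, z)
    return x, P
```

Fed with positions that grow linearly over time, the filter converges towards the true position and velocity even when started from a wrong initial state. A 2D point tracker would run the same recursion with a four-dimensional state.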
In [26], the authors present a tracking algorithm for vehicles at night (see figure 10). In this work, vehicles are detected and tracked based on their headlight pairs. Assuming that the routes are linear, a Kalman filter is used to predict the movement of the headlights. When a vehicle turns, its Kalman filter is re-initialized.

Figure 10. Illustration of a night time traffic surveillance system (source [26]).

In [12], the authors present an algorithm to track mobile objects in different scene conditions (see the illustration in figure 11). The main idea of this tracker combines state estimation, multi-feature similarity measures and trajectory filtering. A feature set (distance, area, shape ratio, color histogram) is defined for each tracked object to search for its best matching object. This best matching object and the state estimated by the Kalman filter are combined to update the position and size of the tracked object. However, the mobile object trajectories are usually fragmented because of occlusions and misdetections. Therefore, the authors also propose a trajectory filter, named global tracker, which aims at removing noisy trajectories and fusing the fragmented trajectories belonging to the same mobile object.

Because the Kalman filter assumes that the variation of the considered variables follows a Gaussian distribution, these approaches can only be applied to track objects with linear movements, or with movements whose direction and speed vary only slightly. In order to overcome these limitations, an Extended Kalman filter [6] or a particle filter [3] can be used.

Figure 11. Illustration of tracking algorithm output for a TRECVid video (source [12]).

IV. Appearance Tracking 

Appearance tracking is performed by computing the motion of the object, which is represented by a primitive object region, from one frame to the next. The tracking methods belonging to this type of approach are divided into two sub-categories: single view-based (called template-based in [37]) and multi view-based.

A. Single View-based Approaches 

This approach category is widely studied in the state of the art for tracking mobile objects in a single camera view. Many methods have been proposed to describe the object appearance. In [31], the authors present a people detection and tracking algorithm using Haar [33] and Local Binary Pattern (LBP) [25] features combined with online boosting (see figure 12). The main idea is to use these features to describe the shape, the appearance and the texture of objects. While Haar features encode the generic shape of the object, LBP features capture small local texture details and are therefore more discriminative. First, the image is divided into cells and the Haar features are applied in each cell to detect people. Each detected person is divided into a grid of 2 x 3 blocks. Each block is divided into 9 sub-regions. For each sub-region, the pixel grey values are used to apply the 8-neighbour LBP computation scheme (see figure 13). The LBP features are then used to track people.
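The 8-neighbour LBP computation mentioned above can be sketched in a few lines. The code below is a generic illustration of the basic LBP operator on a grey-level image stored as a list of rows; the neighbour ordering and the `>=` comparison are conventions assumed for this example and may differ from the variant used in [31].

```python
# Offsets of the 8 neighbours, visited clockwise from the top-left pixel.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """Return the 8-bit LBP code of the pixel at (row, col).

    Bit k is set when the k-th neighbour is at least as bright as the
    centre pixel. The pixel must not lie on the image border.
    """
    centre = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(NEIGHBOUR_OFFSETS):
        if image[row + d_row][col + d_col] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram (256 bins) of the LBP codes of all non-border pixels."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            histogram[lbp_code(image, row, col)] += 1
    return histogram
```

A texture descriptor for a sub-region is then typically the histogram of its codes, which can be compared between frames to track the same person.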

Both classifiers (Haar and LBP) are combined with online boosting [19]. The application of these two features in each cell (for the Haar features) or in each region (for the LBP features) is considered as a weak classifier. These weak classifiers cluster samples by assuming a Gaussian distribution of the considered features. This online boosting scheme can help the system to adapt to specific problems which can occur during the online process (e.g. change of lighting conditions, occlusion). However, the online training is time consuming. Also, the authors do not explain clearly enough how positive and negative samples are determined in this training. It seems that the system has to learn on a sequence in which there is only one person before handling complex detection and tracking cases. The tested sequences are still simple (e.g. few people in the scene, simple occlusions).

Figure 12. A Haar-like features classifier is employed as a generic detector, while an online LBP features recognizer is instantiated for each detected object in order to learn its specific texture (source [31]).

Figure 13. Illustration of the LBP features computation (source [31]).

In [39], the authors present a method to detect occlusions and track people's movements using Bayesian decision theory. The mobile object appearance is characterized by color intensity and color histogram. For each object pair detected in two consecutive frames, if the similarity score is higher than a threshold, the two objects are considered as matched and their templates are updated. If the matching score is lower than this threshold, the authors assume that an occlusion occurs. The mobile object is then divided into sub-parts and the similarity scores are computed for these parts. If the matching score of one object part is high enough while the other ones are low, an occlusion is detected. The mobile object can still be tracked but its template is not updated. This paper thus proposes a method to detect and track objects in occlusion cases: the authors define a mechanism to distinguish between an object appearance change due to occlusion and a real change (e.g. due to a change of scene illumination or of the object's distance to the camera). However, the features used for characterizing the object appearance (i.e. intensity and color histogram) are not reliable enough in the case of poor lighting conditions or weak contrast. The tested video sequences are not complex enough to prove the effectiveness of this approach.
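The decision logic described above (match, occlusion, or no match) can be illustrated with a simple threshold rule. This sketch replaces the Bayesian decision of [39] with plain thresholds on histogram intersection scores, and takes the whole-object score as the mean of the part scores; both choices, as well as the names and the threshold value, are assumptions made for the example.

```python
def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def classify_match(template_parts, candidate_parts, threshold=0.7):
    """Compare a template with a candidate, both given as one histogram per sub-part.

    Returns:
      "matched"   - overall similarity is high: track and update the template;
      "occluded"  - overall similarity is low but at least one part still
                    matches: track without updating the template;
      "unmatched" - no part matches.
    """
    scores = [
        histogram_intersection(t, c)
        for t, c in zip(template_parts, candidate_parts)
    ]
    overall = sum(scores) / len(scores)
    if overall >= threshold:
        return "matched"
    if any(score >= threshold for score in scores):
        return "occluded"
    return "unmatched"
```

Keeping the template unchanged in the "occluded" case prevents the occluding object from contaminating the appearance model.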

In [11], the authors propose a tracking algorithm whose parameters can be learned offline for each tracking context. A feature pool is used to compute the matching score between two detected objects. This feature pool includes 2D and 3D displacement distances, 2D sizes, color histogram, histogram of oriented gradients (HOG), color covariance and dominant color. An offline learning process is proposed to search for useful features and to estimate their weights for each context. In the online tracking process, a temporal window is defined to establish the links between the detected objects (see figure 14). This makes it possible to find the object trajectories even if the objects are misdetected in some frames. A trajectory filter is proposed to remove noisy trajectories. However, the authors suppose that the context within a video sequence is fixed over time. Moreover, the tracking context is selected manually.

Figure 14. The graph representing the established links of the detected objects in a temporal window of size T2 frames (source [11]).

Figure 16. The dominant color separation technique: a) original image; b) upper body dominant color mask; c) lower body dominant color mask (source [5]).

B. Multi-view Approach 

The methods belonging to this type of approach are applied for tracking objects in a multi-camera system. Because cameras can differ in their rendering of colors or illumination, a color normalization step is usually necessary to make object colors from different cameras comparable (e.g. using grey intensity or computing the mean and standard deviation values of the color distributions).

In [23], the authors present an appearance model to describe people in a multi-camera system. For each detected object, its color space is reduced using a mean-shift-based approach proposed in [14]. The color texture of the observed object is thereby reduced to a small number of homogeneously colored body segments. For each color segment, the area (in pixels) and the centroid are calculated, and segments smaller than 5% of the body area are removed. Figure 15 illustrates the steps of this object appearance computation. Finally, for an approximate spatial description of the detected person, the person is subdivided into three sections as follows: starting from the bottom, the first 55% as lower body, the next 30% as upper body, and the remaining 15% as head. The appearance descriptor is then composed by assigning each color segment to the body part that contains its centroid. Identical colors which belong to the same body part are merged into one. In doing so, the spatial relationships within a body part are lost, but at the same time this approach leads to an invariant representation of the object in different camera views.



Let $C = (c_1, c_2, ..., c_n)$ be the collection of all $n$ color segments, with $c_d = [L_d, u_d, v_d, w_d, bp_d]^T$, where

• $d = 1..n$.

• $u$, $v$ are the chromatic values and $L$ is the luminance value of the homogeneous segment in the CIE L*u*v* color space [34].

• $w \in [0, 1]$ (weight) is the area fraction of the color segment relative to the object area.

• $bp \in \{\text{head}, \text{upperbody}, \text{lowerbody}\}$ is the index of the body part to which the centroid of the segment belongs.

The appearance feature set is defined by $F^{app} \subseteq C$, where $F^{app}$ is the subset of the color segments with a body part related fraction ($w_d$) higher than a minimum weight (e.g. 10%). For the similarity calculation of two appearance feature sets, the Earth Mover's Distance (EMD) [28] is used.
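The construction of the appearance feature set can be sketched as follows. The example assumes that each segment is given as a (color, weight, centroid row) triple, that image rows grow downwards, and that identical colors in the same body part are merged by summing their weights; these data layouts and the function names are hypothetical and only illustrate the 55% / 30% / 15% split and the minimum-weight filter described above.

```python
def body_part(centroid_row, top_row, bottom_row):
    """Assign a segment to a body part from the vertical position of its centroid.

    Measured from the bottom of the person: the first 55% is the lower body,
    the next 30% the upper body and the remaining 15% the head.
    """
    height_fraction = (bottom_row - centroid_row) / (bottom_row - top_row)
    if height_fraction < 0.55:
        return "lowerbody"
    if height_fraction < 0.85:
        return "upperbody"
    return "head"


def appearance_features(segments, top_row, bottom_row, min_weight=0.10):
    """Build the feature set from (color, weight, centroid_row) segments.

    Segments with the same color in the same body part are merged, and only
    merged segments whose weight exceeds min_weight are kept.
    """
    merged = {}
    for color, weight, centroid_row in segments:
        key = (color, body_part(centroid_row, top_row, bottom_row))
        merged[key] = merged.get(key, 0.0) + weight
    return {key: weight for key, weight in merged.items() if weight > min_weight}
```

Two such feature sets would then be compared with a distance between weighted color sets, such as the Earth Mover's Distance, which is not implemented here.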

In [5], the authors define a signature for identifying people over a multi-camera system. This method studies the Haar and dominant color features. For each single camera, the authors adapt the HOG-based technique used in [15] to detect and track people. The detection algorithm extracts the histograms of gradient orientation, using a Sobel convolution kernel, in a multi-resolution framework to detect human shapes at different scales. With Haar features, the authors use Adaboost [18] to select the most discriminative feature set for each individual. This feature set forms a strong classifier. The main idea of the dominant color feature is to select the most significant colors to characterize the person signature. The human body is separated into two parts: the upper body part and the lower body part. The separation is obtained by maximizing the distance between the sets of dominant colors of the upper and the lower body (see figure 16). The combination of the dominant color descriptors of the upper and lower body is considered as a meaningful feature to discriminate people. An Adaboost scheme is applied to find the most discriminative appearance model.

In [13], the authors compare and evaluate three appearance descriptors which are used for estimating the appropriate transform between each camera's color spaces. These three appearance descriptors are: (1) mean color, (2) covariance matrix of the features: color, 2D position, oriented gradients for each channel and (3) MPEG-7 Dominant Color descriptor. In order to compare the color descriptors from two cameras, two techniques are presented to normalize the color space and to improve color constancy. The first one (First-Order Normalization) consists in computing the mean value of each color component (YCbCr) over a training set of tracked objects in both cameras and computing the linear transformation between both mean values. In the second one (Second-Order Normalization), the authors consider the possibilities of rotation and translation of pixel color values. The term "rotation" refers to the difference of luminance and chromatic channels between two cameras. If there is no mixing between the luminance and the two chromatic channels, the rotation is not considered. The authors have tested these techniques for tracking the movement of a pedestrian over a camera network in subway stations. The results show that applying the color normalization techniques does not significantly improve the performance of the covariance and dominant color descriptors. Also, the mean color descriptor gives the best result (compared to the two other descriptors) when it is combined with the second-order color normalization technique. The paper obtains some preliminary results on the evaluation of different descriptors, but the authors should extend their work to the case of multi-object tracking.

Figure 15. Extraction of significant colors and mapping to dedicated body parts using a simple anthropometric model (source [23]).



V. Silhouette Tracking 

Objects may have complex shapes, for example hands, head and shoulders, that cannot be well described by simple geometric shapes. Silhouette-based methods provide a more accurate shape description for these objects. The object model can be in the form of a color histogram or the object contour. According to [37], silhouette trackers are divided into two categories, namely shape matching and contour tracking. While shape matching methods search for the object silhouette in the current frame, contour tracking evolves an initial contour to its new position in the current frame, either by using state space models or by direct minimization of some energy function.



A. Contour Tracking 



In [36], the authors present an object contour tracking approach using graph cuts based active contours (GCBAC). Given an initial boundary near the object in the first frame, GCBAC can iteratively converge to an optimal object boundary. In each frame thereafter, the resulting contour in the previous frame is taken as initialization and the algorithm consists of two steps. In the first step, GCBAC is applied to the difference image between a frame and its previous one to produce a candidate contour. This candidate contour is taken as the initialization of the second step, which applies GCBAC to the current frame directly. If the amount of difference within a neighbourhood of the initial contour is less than a predefined threshold, the authors consider that the object is not moving and the initial contour is sent directly to the second step. The initialization of the second step is therefore either the contour from the previous frame or the resulting contour of the first step. Figure 17 presents a sketch of this object contour tracking algorithm. By using the information gathered from the image difference, this approach can remove the background pixels from the object contour. However, this approach only works effectively if the object does not move too fast and/or does not change a lot between consecutive frames. This means that this approach cannot handle the cases of object occlusion. Figure 18 presents a head tracking result when the head is rotating and translating.

The authors in [32] present a contour tracking algorithm based on an extended greedy snake technique combined with a Kalman filter. The contour of a mobile object consists of a set of control points (called snaxels). First, the system computes the centroid of an object contour by averaging the coordinates of its control points. A contour is represented by its centroid and the vectors giving the coordinates of the control points relative to the centroid. The tracking algorithm then uses a Kalman filter to estimate the new centroid position in the next frame. The new control points are calculated based on this new centroid, the vectors determined in the last frame, the shape scale and the scaling factor. A new initial contour is then constructed from these new control points. After that, the greedy snake technique is applied to reconstruct the contour of the mobile object.



Figure 17. Sketch of the object contour tracking algorithm using GCBAC (source [36]). Flowchart boxes: frame n-1; resulting contour of frame n-1; frame n; apply GCBAC on difference image; resulting contour of difference image; apply GCBAC on frame n; resulting contour of frame n.

Figure 18. Head tracking result when the head is rotating and translating; panels (a) to (f) show frames 2-5, 2-10, 2-15, 2-20, 2-25 and 2-30 (source [36]).



For each of the 8 neighbour points of a snaxel, the algorithm computes a snake energy value, and the control point is moved to the neighbour point which has the minimum energy. The contour is thus updated according to the new control points. The snake energy includes an internal and an external energy. The internal energy consists of a continuity energy and a curvature energy. While the internal energy determines the shape of the contour, the external energy prevents the contour from improper shrinking or shape change and always holds it close to the target boundary. In this paper, the authors use the Kalman filter to estimate the position of the contour centroid in the next frame. The field energy value and the application of the Kalman filter are useful for tracking targets with high speed and large displacement. However, only three illustrations are provided, and they are too simplistic: the object to track is black on a white background. A classical color segmentation should be able to correctly detect the single object in the scene.

Figure 19. Computation of the color and shape based appearance model of detected moving blobs (source [21]).
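One pass of a greedy snake update can be sketched as follows. This is a generic textbook-style formulation, not the extended variant of [32]: the continuity and curvature terms, the weights `alpha`, `beta` and `gamma`, and the fact that the current position competes with its 8 neighbours are all assumptions made for the example, and the per-neighbourhood normalisation of the energy terms is omitted for brevity.

```python
import math


def greedy_snake_step(points, external, alpha=1.0, beta=1.0, gamma=1.0):
    """Move each snaxel of a closed contour to its lowest-energy neighbour.

    points   -- list of (x, y) integer control points, in contour order
    external -- function (x, y) -> external energy (lower near the boundary)
    """
    n = len(points)
    # Average spacing between consecutive snaxels, used by the continuity term.
    mean_dist = sum(
        math.hypot(points[i][0] - points[(i + 1) % n][0],
                   points[i][1] - points[(i + 1) % n][1])
        for i in range(n)
    ) / n
    new_points = list(points)
    for i in range(n):
        prev_pt = new_points[i - 1]
        next_pt = new_points[(i + 1) % n]
        x, y = new_points[i]
        best, best_energy = (x, y), float("inf")
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cx, cy = x + dx, y + dy
                # Continuity: keep the spacing close to the average spacing.
                continuity = (mean_dist
                              - math.hypot(cx - prev_pt[0], cy - prev_pt[1])) ** 2
                # Curvature: squared second difference along the contour.
                curvature = ((prev_pt[0] - 2 * cx + next_pt[0]) ** 2
                             + (prev_pt[1] - 2 * cy + next_pt[1]) ** 2)
                energy = (alpha * continuity + beta * curvature
                          + gamma * external(cx, cy))
                if energy < best_energy:
                    best, best_energy = (cx, cy), energy
        new_points[i] = best
    return new_points
```

Repeating this step until few snaxels move gives the final contour; in the approach described above, the starting contour for each frame would come from the Kalman-predicted centroid.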

B. Shape Matching 

In [21], the authors present an approach to compute the shape similarity between two detected objects. The object shape is described by a Gaussian distribution of the RGB color of moving pixels and by edge points. Given a detected moving blob, a reference circle $C$ is defined as the smallest circle containing the blob. This circle is uniformly sampled into a set of control points $P_i$. For each control point $P_i$, a set of concentric circles of various radii is used to define the bins of the appearance model. Inside each bin, a Gaussian color model is computed for modeling the color properties of the overlapping pixels between a circle and the detected blob. Therefore, for a given control point $P_i$ we have a one-dimensional distribution $\gamma_i(P_i)$. The normalized combination of the distributions obtained from each control point $P_i$ defines the appearance model of the detected blob: $A = \sum_i \gamma_i(P_i)$.

An illustration of the definition of the appearance model is shown in figure 19, where the authors sample the reference circle with 8 control points. The 2D shape description is obtained by collecting and normalizing the corresponding edge points for each bin as follows:

$$E(j) = \frac{\sum_i E_j(P_i)}{\max_j \left( \sum_i E_j(P_i) \right)} \tag{8}$$

where $E(j)$ is the edge distribution for the $j$-th radial bin, and $E_j(P_i)$ is the number of edge points for the $j$-th radial bin defined by the $i$-th control point $P_i$.
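The computation behind equation (8) can be illustrated with a short function. The sketch assumes that the edge counts are already available as a table indexed by control point and radial bin, and that the normalisation divides each bin total by the largest bin total; the function name and the input layout are hypothetical.

```python
def edge_distribution(edge_counts):
    """Normalised edge distribution over radial bins.

    edge_counts[i][j] is the number of edge points falling in radial bin j
    as seen from control point i. Counts are summed over the control points
    and divided by the largest bin total, so the result lies in [0, 1].
    """
    n_bins = len(edge_counts[0])
    totals = [sum(row[j] for row in edge_counts) for j in range(n_bins)]
    peak = max(totals)
    if peak == 0:
        return [0.0] * n_bins
    return [total / peak for total in totals]
```

For example, with two control points and counts `[[1, 0], [1, 4]]`, the bin totals are 2 and 4, giving the distribution `[0.5, 1.0]`.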

The model defined in this approach is invariant to translation. Rotation invariance is also guaranteed, since a rotation of the blob in the 2D image is equivalent to a permutation of the control points; this is achieved by taking a larger number of control points along the reference circle. Finally, defining the reference circle as the smallest circle containing the blob guarantees invariance to scale. This approach is interesting, but the authors only test it on simple video sequences in which there are only two moving people.

VI. Conclusion 

This paper has presented the classification of tracking algorithms proposed in [37]. The trackers are divided into three categories based on the tracked target: point tracking, appearance tracking and silhouette tracking. This classification is only relative because there are still many tracking algorithms which combine different approaches, such as point-based and appearance-based [22], contour-based and shape-based [38], or contour-based and deterministic-based [17]. By understanding the advantages and limitations of each tracker category, we can select a suitable tracker for each concrete video scene.